Amendments to the Specification:

Please replace the paragraph [0004] with the following amended paragraph: 
[0004] A hardware repair may be relatively simple. For example, a service
technician replaces the defective component. This repair action usually is
successful. Software repairs, however, differ from hardware repairs. Software may
be repaired by restarting some fraction of the system components, but such repair
attempts often may fail. Software restarts may be escalated by restarting
more components. These higher level repairs are often more effective. Multiple
levels of escalation may exist.

Please replace the paragraph [0005] with the following amended paragraph: 
[0005] A system may include a large number of distinct software components.
Each component may have different failure rates and modes, and different
levels of restart may have different efficacies. The overall recovery time for a whole
node is a non-trivial function of the recovery times for all of the individual software
components.

Please replace the paragraph [0006] with the following amended paragraph: 
[0006] Hardware failures may be modeled hierarchically such that the results
of a complex lower level model can be wrapped up into a few failure rates in a higher
level model. Thus, a complex system may be viewed as a [[rested]] nested set of
simpler models. Software tends to have cross-level interactions, and it may be
necessary to include all of the software components into the higher level models.
Problems may arise from this practice because the complexity of a model is
exponential in the number of components that it contains.
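
The exponential growth can be made concrete with a minimal sketch. It assumes, purely for illustration, that each component is either up or down, so that a flat model of n components has 2^n joint states. The function name is hypothetical and is not part of the specification.

```python
from itertools import product


def joint_states(num_components):
    """Enumerate every up/down combination of a flat model (illustrative assumption)."""
    return list(product(("up", "down"), repeat=num_components))


# The state count doubles with every component added to the flat model.
for n in (1, 2, 4, 8):
    print(n, len(joint_states(n)))
```

Under this assumption, wrapping a group of components into a single lower level model replaces that group's many joint states with a few summary rates in the higher level model.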

Please replace the paragraph [0009] with the following amended paragraph: 
[0009] According to another embodiment, a method is provided for 
incorporating a software component into a model of a network. The method 
includes determining failure rates for warm recoverable errors and non-warm 
recoverable errors of the software component. The method also includes 
determining the recovery rates for warm recoverable errors and non-warm 
recoverable errors of the software components. The method also includes 
generating warm recoverable error recovery rates. The method also includes 
generating non-warm recoverable error failure rates and the non-warm recoverable
error recovery rates. 

Please replace the paragraph [0010] with the following amended paragraph:

[0010] According to another embodiment, a network model of a network
having at least one node is disclosed. The network model includes a node model for
the node. The network model also includes node parameters for the node model.
The node parameters include a reboot time, [[the]] The network model also includes
a warm recoverable software error state for the node model. The warm
recoverable software error state models warm recoverable software errors of
software components on the node. The network model also includes a non-warm
recoverable software error state for the node [[mode]] model. The non-warm
recoverable software state models non-warm recoverable software errors of the
software components on the node.

Please replace the paragraph [0012] with the following amended paragraph: 

[0012] According to another embodiment, a computer program product
[[comprising]] is provided that includes a computer useable medium having computer
readable code embodied therein for incorporating a software component into a
network. The computer program product is adapted [[when]] to run on a computer to
effect the following steps. The steps include determining recovery rates for warm
recoverable errors and non-warm recoverable errors of the software component.
The steps include generating warm recoverable error state parameters from the
warm recoverable error failure rates and the warm recoverable error recovery rates.
The steps include generating non-warm recoverable error state parameters from the
non-warm recoverable error failure rates and the non-warm recoverable error
recovery rates.

Please replace the paragraph [0013] with the following amended paragraph: 

[0013] According to another embodiment, a computer program product
[[comprising]] is provided that includes a computer useable medium having computer
readable code embodied therein for modeling a software error within a network
model. The computer program product is adapted [[when]] to run on a computer to
effect the following steps. The steps include determining a recoverable state for the
error. The steps also include determining a failure rate for the error. The steps also
include determining a recovery rate for the error. The executed steps also include
incorporating the failure rate and the recovery rate into the recoverable state.

Please replace the paragraph [0026] with the following amended paragraph: 

[0026] Fig. 2 depicts software component error states in accordance with an
embodiment of the present invention. The different component error states depicted
in Fig. 2 correlate to the different types of failures and recovery actions for a 
software application running on a node in network 100, such as software application 
108. The software modeling components also may be used to model operating 
systems on nodes, such as operating system 104. Software applications, however, 
will be referred to in the discussion regarding Fig. 2. 

Please replace the paragraph [0027] with the following amended paragraph: 

[0027] Embodiments of the present invention characterize the behavior of
individual software components in a clustered computer system and incorporate
their combined effects into an understandable and maintainable model without
losing the different behaviors of the individual software components. Availability
models may characterize failure events by their implications, and not by their
causes. The disclosed embodiments adopt this approach and distinguish four
classes of failures. The four classes may capture a large share of failure behavior.
The classes may be intuitive and the associated parameters may be reasonably
measurable or estimatable. The parameters of [[the]] these classes may be
meaningfully summable.
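
One way to read "meaningfully summable" is sketched below. The sketch assumes independent components whose failures in a given class arrive at constant rates, so that the class-level rate for a node is the sum of the per-component rates. The component names and rate values are invented for illustration, and the class keys borrow the csr, cwr, ccr and cfo suffixes that the specification uses in its parameter names.

```python
# Hypothetical per-component failure rates (failures per hour), keyed by failure class.
components = {
    "app_a": {"csr": 1e-3, "cwr": 5e-4, "ccr": 1e-4, "cfo": 2e-5},
    "app_b": {"csr": 2e-3, "cwr": 1e-4, "ccr": 3e-4, "cfo": 1e-5},
}


def aggregate_failure_rates(per_component):
    """Sum failure rates class by class across all components on a node."""
    totals = {}
    for rates in per_component.values():
        for failure_class, rate in rates.items():
            totals[failure_class] = totals.get(failure_class, 0.0) + rate
    return totals


node_rates = aggregate_failure_rates(components)
print(node_rates)
```

The individual component behaviors stay visible in the input table while the node model only needs one rate per class.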

Please replace the paragraph [0031] with the following amended paragraph: 

[0031] The recovery rate for software component soft reset state 202 includes
an error detect time and a recovery time to resolve the failure. For example, the
recovery rate may be the time to detect the application failure and to soft reset the
application. This rate may be known as mu-sw-csr. [[Preferably, mu-sw-csr may be
greater than or equal to about 1 Hz.]] Software component soft reset state 202 also
includes a value for the fraction of repair failures. This value would model
recovery actions that are not effective in resolving the application failure, such as
misdiagnosis of the failure, a corruption in the checkpoint stored for the application,
miscellaneous failures to restart and the like. The fraction of recovery failures value
may be known as f-csr-fail.
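
A minimal sketch of how such a rate could be derived follows. It assumes the rate is the reciprocal of the detect time plus the recovery time, which the paragraph does not state explicitly, and it assumes the failure fraction splits that rate into a successful part and an unsuccessful part. The function name and the sample values are hypothetical.

```python
def recovery_rate_hz(detect_seconds, recover_seconds):
    """Recovery rate in Hz, taken here as 1 / (detect time + recovery time)."""
    total = detect_seconds + recover_seconds
    if total <= 0:
        raise ValueError("total recovery time must be positive")
    return 1.0 / total


# Hypothetical soft reset: 0.2 s to detect the failure, 0.3 s to reset the application.
mu_sw_csr = recovery_rate_hz(0.2, 0.3)

# Assumed fraction of soft resets that do not resolve the failure.
f_csr_fail = 0.05

# Rate of resets that succeed, and rate of resets that leave the failure unresolved.
success_rate = mu_sw_csr * (1.0 - f_csr_fail)
unresolved_rate = mu_sw_csr * f_csr_fail
```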

Please replace the paragraph [0033] with the following amended paragraph:

[0033] The recovery rate for software component warm restart state 204
includes an error detect time and a recover time to resolve the failure. For example,
the recovery rate may be the time to detect the application failure and to warm
restart the application. This rate may be known as mu-sw-cwr. [[Preferably, mu-sw-cwr
may be in the range of about .3 Hz to about .6 Hz.]] Software component warm
restart state 204 also includes a value for the fraction of recovery failures. This
value would model recovery actions that are not effective in resolving the application
failure, such as misdiagnosis of the failure, a corruption in the checkpoint stored for
the application, miscellaneous failures to restart and the like. The fraction of
recovery failures value may be known as f-cwr-fail.

Please replace the paragraph [0035] with the following amended paragraph:

[0035] The recovery rate for software component cold restart state 206
includes an error detect time and a recover time to resolve the failure. For example,
the recovery rate may be the time to detect the application failure and to cold restart
the application. This rate may be known as mu-sw-ccr. [[Preferably, mu-sw-ccr may
be in the range of about .3 Hz to about .6 Hz.]] Software component cold restart state
206 also includes a value for the fraction of recovery failures. This value would
serve to model recovery actions that are not effective in resolving the application
failure, such as misdiagnosis of the failure, miscellaneous failures to restart and the
like. The fraction of recovery failures value may be known as f-ccr-fail.

Please replace the paragraph [0037] with the following amended paragraph: 

[0037] The recovery rate for software component fail-over model 208 includes
an error detect time and recover time to resolve the failure. For example, the
recovery rate may be the time to detect the application failure and to reboot the
node. This rate may be known as mu-sw-cfo. [[Preferably, mu-sw-cfo may be in the
range of about .3 Hz to about 1 Hz.]] Software component fail-over state 208 also
includes a value for the fraction of recovery failures. This value would serve to
model recovery actions that are not effective in resolving the application failure, such
as corruptions in the checkpoints, miscellaneous failures to restart and the like. The
fraction of [[recover]] recovery failures value may be known as f-cfo-fail.
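
The four recovery rates and failure fractions can be combined into an escalation ladder. The sketch below assumes that a recovery action which fails at one level escalates to the next more drastic level, in the order soft reset, warm restart, cold restart, fail-over; the specification describes escalation only in general terms. The function name and all numeric values are hypothetical.

```python
def escalate(levels):
    """Return (expected seconds spent in the ladder, probability that no level succeeds).

    levels: list of (mu_hz, fail_fraction) pairs ordered from mildest to most drastic.
    """
    expected_seconds = 0.0
    reach_probability = 1.0  # chance that recovery has escalated this far
    for mu_hz, fail_fraction in levels:
        # Mean time at this level is 1 / mu, weighted by the chance of reaching it.
        expected_seconds += reach_probability / mu_hz
        reach_probability *= fail_fraction
    return expected_seconds, reach_probability


# Hypothetical (mu, f-fail) values standing in for mu-sw-csr, mu-sw-cwr, mu-sw-ccr, mu-sw-cfo.
ladder = [(1.0, 0.1), (0.5, 0.1), (0.5, 0.1), (0.25, 0.1)]
expected_seconds, unresolved = escalate(ladder)
print(expected_seconds, unresolved)
```

The second return value is the share of failures that none of the four application-level actions resolves under these assumptions.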

Please replace the paragraph [0039] with the following amended paragraph: 

Serial No. 09/850,183 

Reply to Office Action of September 9, 2004 



[0039] An analogous approach to that used for application failures may be
used to model operating system failures. An operating system affects a large
number of operations, and the operating systems on the various nodes cooperate.
Slightly different failure classes may be assigned to an operating system failure.
The first class may be problems requiring a single node reboot. The second class
may be problems requiring a reboot of the entire cluster. The third class may be
problems requiring service.

Please replace the paragraph [0041] with the following amended paragraph: 

[0041] Software component node reboot state 210 may be characterized by a
reboot rate known as mu-node-reboot. The reboot rate may reflect the time it takes
to reboot the affected node, and bring all the node components back on-line.
[[Preferably, mu-node-reboot may be from about .05 Hz to about .3 Hz.]] Software
component node reboot state 210 also includes a value for the fraction of reboot
failures. This value would serve to model reboots that are not effective in resolving
the application failure, such as damage not confined to one node, miscellaneous
failures to reboot and the like. The fraction of reboot failure value may be known as
f-nr-fail.

Please replace the paragraph [0042] with the following amended paragraph: 

[0042] Software component cluster reboot state 212 may reflect those errors
that are not resolved by any of the above-disclosed models[[,]] and result in an entire
network cluster reboot. If a node reboot is ineffective, a cluster reboot may be
performed. [[A node reboot has not been effective in resolving the error.]] A cluster
reboot involves a shutdown and reboot of all computers in the cluster. An error or
failure impacting multiple nodes may be remedied by the cluster reboot. The rate of
cluster reboots may be characterized by the time it takes to reboot the cluster
network and may be known as mu-cluster-reboot. Software component cluster
reboot state 212 and software component node reboot state 210 may be
characterized by platform-specific parameters. Platform-specific parameters
indicate that the errors are not confined to a software application[[,]] and [[measures]]
indicate that actions outside of restarting the application need to be taken.

Please replace the paragraph [0048] with the following amended paragraph: 

[0048] According to an embodiment, the time parameters determined above 
may be combined with the time-sw-ccr parameters of the application components in
order to generate the node and cluster reboot rates. By incorporating application
restart times into node restart times, a platform specific summation formula is 
determined that accounts for the plausible degrees of parallelism/serialization within 
the network. 

Please replace the paragraph [0049] with the following amended paragraph:

[0049] Because [[of]] the fail-over of whole nodes may occur rather than
individual software components, an aggregate node fail-over time is computed. The
aggregate node fail-over time may be a platform specific summation of the
component fail-over times for all the software components on a node. As noted
above, these failure rates and recovery rates may be used to determine parameters
for a single software failure model for a particular platform.
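
The specification leaves the summation formula to the platform. A minimal sketch of one plausible form follows, assuming a fixed number of components can fail over concurrently: a parallelism of 1 gives a plain serial sum, and a parallelism at least as large as the component count gives the time of the slowest component. The function name, the greedy lane assignment and the sample times are all assumptions made for illustration.

```python
def aggregate_time(component_seconds, parallelism=1):
    """Combine per-component times when `parallelism` components may proceed at once.

    Components are assigned greedily, longest first, to the least-loaded lane;
    the aggregate time is the finish time of the busiest lane.
    """
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    lanes = [0.0] * parallelism
    for seconds in sorted(component_seconds, reverse=True):
        lanes[lanes.index(min(lanes))] += seconds
    return max(lanes)


failover_seconds = [4.0, 2.5, 1.5]  # hypothetical component fail-over times
serial = aggregate_time(failover_seconds, parallelism=1)
parallel = aggregate_time(failover_seconds, parallelism=3)
print(serial, parallel)
```

A real platform would substitute its own dependency ordering between components for the greedy assignment used here.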

Please replace the paragraph [0058] with the following amended paragraph: 

[0058] Fig. 5 depicts a flowchart for constructing a software availability model
in accordance with an embodiment of the present invention. Step 500 executes by
determining whether a component to be modeled is a software application or part of 
the operating system. If [[no]] not part of the operating system, then step 502 
executes by estimating/measuring the failure rate, repair time and efficacy value for 
the warm reset state. Step 504 executes by estimating/measuring the failure rate, 
repair time and efficacy value for the warm restart state. Step 506 executes by 
estimating/measuring the failure rate, repair time and efficacy value for the cold 
restart state. Step 508 executes by estimating/measuring the failure rate, repair time 
and efficacy value for the fail-over state. 

Please replace the paragraph [0060] with the following amended paragraph: 

[0060] If step 500 [[is yes]] indicates the component to be modeled is part of
the operating system, then step 516 executes by estimating/measuring the node
reboot failure rate, repair time and efficacy value. Step 518 executes by 
estimating/measuring the cluster restart failure rate and repair time. Step 520 
executes by computing a node reboot repair rate from a platform-specific sum of the 
operating system times and software component cold restart times. 
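
Step 520 can be sketched as follows, assuming the simplest platform-specific sum: the operating system times and the component cold restart times are added serially, and the repair rate is the reciprocal of the total. The function name and the sample times are hypothetical, and a platform with parallel restarts would use a different sum.

```python
def node_reboot_repair_rate(os_seconds, cold_restart_seconds):
    """Repair rate in Hz from a serial sum of OS times and component cold restart times."""
    total = sum(os_seconds) + sum(cold_restart_seconds)
    if total <= 0:
        raise ValueError("total reboot time must be positive")
    return 1.0 / total


os_seconds = [30.0, 10.0]   # hypothetical operating system shutdown and boot times
time_sw_ccr = [4.0, 6.0]    # hypothetical cold restart times of two components
mu_node_reboot = node_reboot_repair_rate(os_seconds, time_sw_ccr)
print(mu_node_reboot)
```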